I, Korbin Van Dyke, state that:



Summary of My Opinions 

1. I previously submitted a declaration in connection with this proceeding (see Van
Dyke Declaration dated August 15, 2007), referenced hereinafter as the "initial Van Dyke 
declaration". For brevity, I will not repeat information set forth in the initial Van Dyke 
declaration in this declaration. 

2. In preparation of this declaration I have reviewed U.S. Patent Application Serial 
No. 10/757,516. I have also reviewed U.S. Patent Nos. 5,742,840 and 6,295,599 (respectively 
the '840 and '599 patents) that the 10/757,516 patent application indirectly claims priority to, as 
well as appendices to the '840 and '599 patents (the Terpsichore and Zeus System Architecture 
manuals, respectively, and hereinafter referred to respectively as the Terpsichore and the Zeus 
manuals). I have reviewed the Office Action for the 10/757,516 patent application mailed on 
October 30, 2007, including the paragraph on page 11 that discusses the Response to Arguments
and particularly the Examiner's conclusion that the priority for the claimed invention does not 
extend to the '840 or the '599 patents, since limitations of claims 9, 18, 24, 25, 31, and 32 are not 
supported by the '840 or the '599 patents. My understanding is that the features of the claimed 
invention are taught and supported by complying with the written description requirement and 
the enablement requirement. My understanding of the written description requirement is that a 
patent disclosure must describe the claimed invention in sufficient detail that one of ordinary 
skill in the art can reasonably conclude that the inventor had possession of the claimed invention
at the time of filing the patent disclosure. My understanding of the enablement requirement is 
that the patent disclosure must contain sufficient information regarding the subject matter of the 
claims to enable one of ordinary skill in the pertinent art to make and use the claimed invention. 
I further understand that whether the enablement requirement is met depends on whether undue 
experimentation is necessary for one of skill in the art to practice the invention in light of the 
patent disclosure. 

3. Based on my review of the materials identified in paragraph 2 of this declaration, 
it is my opinion that with respect to the following limitations relating to claims 9, 18, 24, 25, 31, 
and 32 (as amended), the disclosures of the '840 patent and the '599 patent each indicate that the 
inventors were in possession of the claimed invention of the 10/757,516 patent application as of 
the August 16, 1995 filing date of the '840 patent and further as of the August 24, 1999 filing 
date of the '599 patent; and further the disclosures of the '840 patent and the '599 patent each 
would have enabled a person of ordinary skill in the art to make and use, without undue 
experimentation, the claimed invention of the 10/757,516 patent application as of the August 16, 
1995 filing date of the '840 patent, and further as of the August 24, 1999 filing date of the '599 
patent. The limitations referred to are: 

{claim 9} "wherein the execution unit is further operable to, in response to
decoding a second single instruction specifying a register containing a first 
plurality of floating-point operands and another register containing a second 
plurality of floating-point operands, multiply the first plurality of floating-point 
operands by the second plurality of floating-point operands to produce a plurality 
of products and provide the plurality of products to partitioned fields of a result 
register as a catenated result" 

{claim 18} "wherein the execution unit is further operable to, in response to
decoding a second single instruction specifying a register containing a first 
plurality of floating-point operands and another register containing a second 
plurality of floating-point operands, multiply the first plurality of floating-point 
operands by the second plurality of floating-point operands to produce a plurality 
of products and provide the plurality of products to partitioned fields of a result 
register as a catenated result" 

{claim 24} "wherein each of the first and second operands has a width of 64 
bits" 

{claim 25} "wherein the execution unit is further capable of executing a
plurality of different group floating-point arithmetic operations that arithmetically 
operate on multiple floating-point operands stored in partitioned fields of registers 
in the register file to produce a catenated result that is returned to a register in the 
register file, wherein the catenated result comprises a plurality of individual 
floating-point results" 

{claim 31} "wherein each of the first and second operands has a width of 64
bits" 
{claim 32} "wherein the execution unit is further capable of executing a
plurality of different group floating-point arithmetic operations that arithmetically 
operate on multiple floating-point operands stored in partitioned fields of registers 
in the register file to produce a catenated result that is returned to a register in the 
register file, wherein the catenated result comprises a plurality of individual 
floating-point results" 

Summary of '840 Analysis: 

4. The disclosure of the '840 patent provides detailed information and description 
that I believe indicates that the inventors were in possession of the aforementioned limitations of 
claim 9 (as amended) of the 10/757,516 patent application, and that I further believe would have 
enabled a person of ordinary skill in the art to make and use the claimed invention without undue 
experimentation. For example, on at least pages 19-21 (describing floating-point data formats), 
24-25 (describing general registers), 29 and 47-48 (describing floating-point arithmetic 
hardware), and 129-131 of the Terpsichore manual (describing details of Group Floating-point 
instructions such as various forms of Group Floating-point Multiply instructions) there are 
detailed descriptions of the aforementioned claim elements. 

5. The aforementioned limitations of claim 18 (a data processing system claim) are 
substantially similar to the aforementioned limitations of claim 9 (a programmable processor 
claim). Further, parent claim context providing antecedent basis for claim 18 (specifically "the 
execution unit") is substantially similar to corresponding parent claim context of claim 9. Thus, 
for at least the reasons described in paragraph 4 of this declaration, the disclosure of the '840 
patent provides detailed information and description that I believe indicates that the inventors 
were in possession of the aforementioned limitations of claim 18 (as amended) of the 10/757,516 
patent application, and that I further believe would have enabled a person of ordinary skill in the 
art to make and use the claimed invention without undue experimentation. 

6. The disclosure of the '840 patent provides detailed information and description 
that I believe indicates that the inventors were in possession of the aforementioned limitations of 
claim 24 (as amended) of the 10/757,516 patent application, and that I further believe would 
have enabled a person of ordinary skill in the art to make and use the claimed invention without 
undue experimentation. For example, on at least pages 24-25 (describing general registers), 26 
(generally describing store instructions), and 150-157 of the Terpsichore manual (describing 
details of Store and Store Immediate instructions such as various forms of Store Immediate and 
Store Multiplex Immediate instructions) there are detailed descriptions of the aforementioned 
claim elements. 

7. The disclosure of the '840 patent provides detailed information and description 
that I believe indicates that the inventors were in possession of the aforementioned limitations of 
claim 25 (as amended) of the 10/757,516 patent application, and that I further believe would 
have enabled a person of ordinary skill in the art to make and use the claimed invention without 
undue experimentation. For example, on at least pages 19-21 (describing floating-point data 
formats), 24-25 (describing general registers), 29 and 47-48 (describing floating-point arithmetic 
hardware), and 129-131 of the Terpsichore manual (describing details of Group Floating-point 






Supplemental Declaration of Korbin S Van Dyke 



instructions such as various Group Floating-point Add, Divide, and Multiply forms) there are 
detailed descriptions of the aforementioned claim elements. 

8. The aforementioned limitations of claim 31 (a device claim) are substantially
similar to the aforementioned limitations of claim 24 (a programmable processor claim). 
Further, parent claim context providing antecedent basis for claim 31 (specifically "the first and 
second operands") is substantially similar to corresponding parent claim context of claim 24. 
Thus, for at least the reasons described in paragraph 6 of this declaration, the disclosure of the 
'840 patent provides detailed information and description that I believe indicates that the 
inventors were in possession of the aforementioned limitations of claim 31 (as amended) of the 
10/757,516 patent application, and that I further believe would have enabled a person of ordinary 
skill in the art to make and use the claimed invention without undue experimentation. 

9. The aforementioned limitations of claim 32 (a device claim) are substantially 
similar to the aforementioned limitations of claim 25 (a programmable processor claim). 
Further, parent claim context providing antecedent basis for claim 32 (specifically "the execution 
unit") is substantially similar to corresponding parent claim context of claim 25. Thus, for at 
least the reasons described in paragraph 7 of this declaration, the disclosure of the '840 patent 
provides detailed information and description that I believe indicates that the inventors were in 
possession of the aforementioned limitations of claim 32 (as amended) of the 10/757,516 patent 
application, and that I further believe would have enabled a person of ordinary skill in the art to 
make and use the claimed invention without undue experimentation. 

Summary of '599 Analysis: 

10. The disclosure of the '599 patent provides detailed information and description 
that I believe indicates that the inventors were in possession of the aforementioned limitations of 
claim 9 (as amended) of the 10/757,516 patent application, and that I further believe would have 
enabled a person of ordinary skill in the art to make and use the claimed invention without undue 
experimentation. For example, on at least pages 14-16 (describing floating-point data formats), 
19-20 (describing general registers), 23-24 and 55 (describing floating-point arithmetic 
hardware), and 258-260 of the Zeus manual (describing details of Ensemble Floating-point 
instructions such as various forms of Ensemble Multiply Floating-point instructions) there are 
detailed descriptions of the aforementioned claim elements. Note that in the Zeus manual, group 
floating-point instructions are termed "ensemble" floating-point instructions. 

11. The aforementioned limitations of claim 18 are substantially similar to the
aforementioned limitations of claim 9. Thus, for at least the reasons described in paragraph 10 of 
this declaration, the disclosure of the '599 patent provides detailed information and description 
that I believe indicates that the inventors were in possession of the aforementioned limitations of 
claim 18 (as amended) of the 10/757,516 patent application, and that I further believe would 
have enabled a person of ordinary skill in the art to make and use the claimed invention without 
undue experimentation. 

12. The disclosure of the '599 patent provides detailed information and description 
that I believe indicates that the inventors were in possession of the aforementioned limitations of 
claim 24 (as amended) of the 10/757,516 patent application, and that I further believe would 
have enabled a person of ordinary skill in the art to make and use the claimed invention without
undue experimentation. For example, on at least pages 19-20 (describing general registers), 21 
(generally describing store instructions), 123-125, and 128-130 of the Zeus manual (describing 
details of Store and Store Immediate instructions, including Store Multiplex and Store Multiplex 
Immediate forms) there are detailed descriptions of the aforementioned claim elements. 

13. The disclosure of the '599 patent provides detailed information and description 
that I believe indicates that the inventors were in possession of the aforementioned limitations of 
claim 25 (as amended) of the 10/757,516 patent application, and that I further believe would 
have enabled a person of ordinary skill in the art to make and use the claimed invention without 
undue experimentation. For example, on at least pages 14-16 (describing floating-point data 
formats), 19-20 (describing general registers), 23-24 and 55 (describing floating-point arithmetic 
hardware), and 258-260 of the Zeus manual (describing details of Ensemble Floating-point 
instructions, such as various Ensemble Multiply, Add, and Divide forms) there are detailed 
descriptions of the aforementioned claim elements. 

14. The aforementioned limitations of claim 31 are substantially similar to the
aforementioned limitations of claim 24. Thus, for at least the reasons described in paragraph 12 
of this declaration, the disclosure of the '599 patent provides detailed information and 
description that I believe indicates that the inventors were in possession of the aforementioned 
limitations of claim 31 (as amended) of the 10/757,516 patent application, and that I further 
believe would have enabled a person of ordinary skill in the art to make and use the claimed 
invention without undue experimentation. 

15. The aforementioned limitations of claim 32 are substantially similar to the 
aforementioned limitations of claim 25. Thus, for at least the reasons described in paragraph 13 
of this declaration, the disclosure of the '599 patent provides detailed information and 
description that I believe indicates that the inventors were in possession of the aforementioned 
limitations of claim 32 (as amended) of the 10/757,516 patent application, and that I further 
believe would have enabled a person of ordinary skill in the art to make and use the claimed 
invention without undue experimentation.

16. A detailed explanation of the basis for my opinions is set forth in the remainder of 
this declaration. 



Detailed Basis for My Opinions 

Analysis of the disclosures of the '840 and the '599 patents: 

17. For brevity, the following analysis focuses on and provides details relating to the 
'840 patent, while reciting summary information pointing out where similar descriptive 
information is provided in the '599 patent. 

18. As discussed in paragraph 20 of the initial Van Dyke declaration, the '840 patent 
describes structure of a general purpose, programmable media processor (including, for example, 
a register file and an execution unit), and the '840 patent recites that an instruction set for the 
general purpose media processor is described by the Microfiche Appendix (referred to herein as
the Terpsichore manual). In addition to elements discussed in paragraph 20 of the initial Van 
Dyke declaration, the '840 patent also describes that a unified stream of media data is processed 
by storage into the register file 110, and multi-precision arithmetic operations are performed on 
the media data. The operations include Boolean, integer, and floating-point mathematical 
operations (see '840, column 5, lines 47-53). Floating-point addition, subtraction, multiplication, 
division, and square root are supported in hardware ('840, column 15, lines 57-59). Similarly, 
the '599 patent describes structure of a general purpose, programmable processor for broadband 
applications, and the '599 patent includes and refers to a Microfiche Appendix (referred to herein 
as the Zeus manual) that describes, for example, an instruction set for the general purpose 
processor. 

19. The Terpsichore manual describes all of the aforementioned elements of claims 9, 
18, 24, 25, 31, and 32, on at least pages 19-21, 24-26, 29, 47-48, 129-131, and 150-157 (attached 
as Exhibit A). The Zeus manual describes all of the elements of claims 9, 18, 24, 25, 31, and 32, 
on at least pages 14-16, 19-21, 23-24, 55, 123-125, 128-130, and 258-260 (attached as Exhibit 
B). 

Claims 9 and 18 

20. The Terpsichore manual describes all of the aforementioned elements of claim 9 
(and the substantially similar aforementioned elements of claim 18), on at least pages 19-21, 24- 
25, 29, 47-48, and 129-131, describing Group Floating-point instructions such as various forms 
of Group Floating-point Multiply instructions. Paragraphs 21-27 of this declaration discuss 
selected portions of those pages. Paragraphs 28-32 of this declaration discuss how the elements 
of claim 9 (and the substantially similar aforementioned elements of claim 18) are described by 
those pages. The Zeus manual describes all of the elements of claim 9 (and the substantially 
similar aforementioned elements of claim 18), on at least pages 14-16, 19-20, 23-24, 55, and 
258-260. 
21. The '840 patent describes various floating-point data sizes such as 16, 32, 64, and
128 bits (see '840, column 15, lines 62-65, and Fig. 9b, reproduced below). The Terpsichore 
manual, for example on pages 19-21, describes the various floating-point data sizes as designed 
to satisfy ANSI/IEEE standard 754-1985. Similarly, the Zeus manual, for example on pages 14- 
16, provides similar information. 
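
By way of illustration only, the following Python sketch shows how a value is represented as a bit pattern at three of the sizes mentioned above. It is not taken from the '840 patent or from either manual. It assumes that the 16-, 32-, and 64-bit formats match the binary16, binary32, and binary64 layouts implemented by Python's `struct` module; ANSI/IEEE 754-1985 itself defines only the single and double formats, so the 16-bit layout in particular is an assumption. The 128-bit size is omitted because `struct` does not support it, and the function names `encode` and `decode` are hypothetical.

```python
import struct

# struct format codes for the assumed layouts: half, single, double.
_CODES = {16: "e", 32: "f", 64: "d"}


def encode(value: float, size: int) -> int:
    """Return the bit pattern of `value` in the assumed `size`-bit format."""
    raw = struct.pack(">" + _CODES[size], value)
    return int.from_bytes(raw, "big")


def decode(bits: int, size: int) -> float:
    """Return the float represented by a `size`-bit pattern."""
    raw = bits.to_bytes(size // 8, "big")
    return struct.unpack(">" + _CODES[size], raw)[0]


# The same value occupies a different number of bits at each precision.
for size in (16, 32, 64):
    print(size, hex(encode(1.5, size)))
```

Under these assumptions, a 128-bit register pair has room for eight 16-bit, four 32-bit, or two 64-bit values of this kind.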
22. The '840 patent provides description relating to floating-point hardware
capabilities, such as in the Terpsichore manual on page 29, "operations supported in hardware 
are floating-point add, subtract, multiply, divide, and square root", and further on page 47, 
"partitioning favored for the initial implementation places all instructions that involving shifting 
and shuffling in one execution unit, and all instructions that involve multiplication, including 
fixed-point and floating-point multiply and add in another unit". Similarly, the Zeus manual, for 
example on pages 23-24 and 55, has similar descriptive information. 
23. The Terpsichore manual describes several variations of Group Floating-point
Multiply instructions, such as GF.MUL.16, GF.MUL.32, and GF.MUL.64, (among others) as 
described on pages 129-131, and reproduced below (excerpted and annotations added). 
Similarly, the Zeus manual, for example on pages 258-260, provides similar information. 

Group Floating-point

These operations take two values from registers, perform floating-point arithmetic 
on groups of bits in the operands, and place the concatenated results in a register. 



Operation codes

GF.MUL.16      Group floating-point multiply half
GF.MUL.16.C    Group floating-point multiply half ceiling
GF.MUL.16.F    Group floating-point multiply half floor
GF.MUL.16.N    Group floating-point multiply half nearest
GF.MUL.16.T    Group floating-point multiply half truncate
GF.MUL.16.X    Group floating-point multiply half exact
GF.MUL.32      Group floating-point multiply single
GF.MUL.32.C    Group floating-point multiply single ceiling
GF.MUL.32.F    Group floating-point multiply single floor
GF.MUL.32.N    Group floating-point multiply single nearest
GF.MUL.32.T    Group floating-point multiply single truncate
GF.MUL.32.X    Group floating-point multiply single exact
GF.MUL.64      Group floating-point multiply double



24. The Terpsichore manual, on page 130, describes an instruction format for the 
Group Floating-point instructions (including Multiply forms as well as Add and Divide forms), 
reproduced below. Similarly, the Zeus manual, for example on page 260, provides similar 
information. 

Format

GF.op.prec.round rc=ra,rb

 31      24 23    18 17    12 11     6 5        0
| GF.prec  |   ra   |   rb   |   rc   | op.round |
     8         6        6        6         6

The operands of the Group Floating-point instructions (such as Multiply forms) include 'ra', 
'rb', and 'rc'. As described in more detail in paragraphs 26(b)-26(c)ii of this declaration, 
contents of registers specified by the 'ra' and 'rb' operands are interpreted as respective 
collections of partitioned floating-point operands. The partitioned floating-point operands 
relating to 'ra' and 'rb' are pairwise multiplied together, and the results are concatenated and
then stored in a register specified by the 'rc' operand. 
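
By way of illustration only, the field layout of the instruction format above can be sketched in Python. The sketch relies solely on the bit positions and widths shown in the format (8, 6, 6, 6, and 6 bits, from bit 31 down to bit 0). The numeric encodings of the 'GF.prec' and 'op.round' fields are not given in the excerpt, so they are treated here as opaque integers. The names `GroupFloatFields`, `split_fields`, and `join_fields` are hypothetical.

```python
from typing import NamedTuple


class GroupFloatFields(NamedTuple):
    major: int     # bits 31..24, the "GF.prec" field (8 bits)
    ra: int        # bits 23..18, first source register (6 bits)
    rb: int        # bits 17..12, second source register (6 bits)
    rc: int        # bits 11..6, result register (6 bits)
    op_round: int  # bits 5..0, the "op.round" field (6 bits)


def split_fields(word: int) -> GroupFloatFields:
    """Split a 32-bit instruction word into its five fields."""
    if not 0 <= word < (1 << 32):
        raise ValueError("instruction word must fit in 32 bits")
    return GroupFloatFields(
        major=(word >> 24) & 0xFF,
        ra=(word >> 18) & 0x3F,
        rb=(word >> 12) & 0x3F,
        rc=(word >> 6) & 0x3F,
        op_round=word & 0x3F,
    )


def join_fields(fields: GroupFloatFields) -> int:
    """Reassemble the five fields into a 32-bit instruction word."""
    return (
        (fields.major & 0xFF) << 24
        | (fields.ra & 0x3F) << 18
        | (fields.rb & 0x3F) << 12
        | (fields.rc & 0x3F) << 6
        | (fields.op_round & 0x3F)
    )
```

Each 6-bit register field can name any of 64 registers, which is consistent with the 64 general registers described in the Terpsichore manual.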
25. The Terpsichore manual, on pages 24-25, describes registers referenced as
operands of some instructions (such as Group Floating-point instructions, including Multiply 
forms as well as Add and Divide forms), as reproduced below (with annotations illustrating 
examples of an 'ra' operand of '0' and an 'rb' operand of '62'). Similarly the Zeus manual, for 
example on pages 19-20, provides similar information. 

General Registers 

Terpsichore user state includes 64 general registers. All are identical; there is no 
dedicated zero valued register, and there are no dedicated floating-point registers. 

[Figure: diagram of the general registers, annotated with the example 'ra' and 'rb' operands]

The foregoing registers are included in register file 110 of Fig. 7 of the '840 patent.

26. The Terpsichore manual, on pages 130-131, describes a definition of various 
Group Floating-point instructions, including several Multiply forms. The definition is 
reproduced below, with annotations highlighting several elements that are discussed in the 
following sub-paragraphs concerning highlights of what one of ordinary skill in the art would 
understand from the description of the Terpsichore manual. Similarly, the Zeus manual, for 
example on page 260, provides similar information. 
Definition

def GroupFloatingPoint(op,prec,round,ra,rb,rc) as
[2]     a <- RegRead(ra, 128)
        b <- RegRead(rb, 128)
[3]     for i <- 0 to 128-prec by prec
[4]         ai <- F(prec, a_{i+prec-1..i})
            bi <- F(prec, b_{i+prec-1..i})
            if round ≠ NONE then
                if isSignallingNaN(ai) | isSignallingNaN(bi)
                    raise FloatingPointException
                endif
                case op of
                    GF.DIV:
                        if bi=0 then
                            raise FloatingPointArithmetic
                        endif
                    others:
                endcase
            endif
            case op of
                GF.ADD:
                    ci <- ai+bi
[1]             GF.MUL:
[5]                 ci <- ai*bi
                GF.DIV:
                    ci <- ai/bi
            endcase
            case op of
                GF.ADD, GF.MUL, GF.DIV:
[6]                 c_{i+prec-1..i} <- PackF(prec, ci)
            endcase
        endfor
        endcase
        case round of
            X:
            N:
            T:
            F:
            C:
            NONE:
        endcase
        if rc_0 then
            raise ReservedInstruction
        endif
[7]     RegWrite(rc, 128, c)
    endcase
enddef

(a) The 'op' field of the instruction is decoded to distinguish a Multiply form (MUL) 
from an Add (ADD) or Divide (DIV) form, for example, as highlighted by 
annotation [1]. 

(b) Source operands are read from pairs of registers, as specified by the 'ra' and 'rb'
operands, into variables 'a' and 'b', respectively (see annotation [2]), such as
including reading REG[0] into the least-significant 64 bits of 'a' and REG[1] into 
the most-significant 64 bits of 'a', when 'ra' is 0. 
(c) A 'for' construct (see annotation [3]) specifies a number of evaluations of

elements of the construct according to a 'prec' operand that specifies a (floating- 
point) precision to interpret the operands. 

i. For example, if the 'prec' operand is 16, then the 'for' construct is
evaluated with eight values for variable 'i' (0, 16, 32, 48, 64, 80, 96, and
112), respectively. For another example, if the 'prec' operand is 64, then
the construct is evaluated with two values for 'i' (0 and 64), respectively.

ii. Each evaluation begins by determining a partitioned floating-point value 
from each of the variables 'a' and 'b' (see annotation [4]) in accordance 
with the 'prec' operand. For example, if the 'prec' operand is 16, then for
the evaluation where 'i' is 0, a first partitioned floating-point value is
determined from the least-significant 16 bits of 'a', or 'a_{15..0}', as
identified by the expression 'a_{i+prec-1..i}' in the definition. Further in the
evaluation where 'i' is 0, a second floating-point value is determined from
the least-significant 16 bits of 'b'. Continuing with the example, for the
evaluation where 'i' is 16, partitioned floating-point values are determined
from the next-most-significant 16 bits of 'a' and 'b', such that bits 31 to 16
determine the floating-point values. Further continuing with the example,
for the evaluation where 'i' is 112, partitioned floating-point values are
determined from the most-significant 16 bits of 'a' and 'b'. For another
example of construct evaluations, if the 'prec' operand is 64, then the 'for'
construct is evaluated with two values for 'i' (0 and 64). Partitioned
floating-point values are determined in the evaluation where 'i' is 0 as the
least-significant 64 bits of 'a' and 'b', and in the evaluation where 'i' is 64
as the most-significant 64 bits.

iii. The 'for' construct processing continues by decoding a rounding mode
and processing accordingly, and then decoding the 'op' field of the 
instruction (see previously discussed annotation [1]). 

iv. In the case of a Multiply form, the 'for' construct processing continues by
multiplying the partitioned floating-point values from 'a' and 'b' by each
other (see annotation [5]). The multiplying is in accordance with floating- 
point multiplying. 

v. The 'for' construct processing completes by writing the result of the
multiplies into appropriate bits of destination variable 'c' (see annotation
[6]) in accordance with the 'prec' operand. The appropriate bit locations
of 'c' are identical to the bit locations of 'a' and 'b' that the partitioned
floating-point values were determined from. For example, if the 'prec'
operand is 16, then for the evaluation where 'i' is 0, the least-significant
16 bits (i.e. bits 15 to zero) of 'c' are written, and for the evaluation where
'i' is 16, the next-most-significant 16 bits (i.e. bits 31 to 16) of 'c' are
written. For the evaluation where 'i' is 112, the most-significant 16 bits (i.e.
bits 127 to 112) of 'c' are written.
vi. Note that each evaluation of the 'for' construct is independent of the other
evaluations, serving to operate on different unique and non-overlapping 
partitioned fields of the operands. Thus each evaluation is performable in 
parallel with the other evaluations, sequentially with the other evaluations, 
or any combination thereof. 

(d) After completion of the 'for' construct, processing completes by writing 'c' into a 
pair of registers, as specified by the 'rc' operand (see annotation [7]). 

One of ordinary skill in the art would readily understand that the computation of the floating- 
point multiplies would occur in ALU 102 of Fig. 7. 
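
By way of illustration only, the steps described in sub-paragraphs (a) through (d) above can be sketched in Python for the Multiply form. This sketch is not the Terpsichore definition and makes several assumptions: the partitioned fields are taken to follow the binary16, binary32, and binary64 layouts provided by Python's `struct` module; register pairs are read as REG[r] (least-significant 64 bits) and REG[r+1] (most-significant 64 bits) for any 'r', generalizing the 'ra' is 0 example; and rounding-mode selection, exception handling, and the reserved-instruction check are omitted. All function names are hypothetical.

```python
import struct

# struct codes for the assumed field layouts: half, single, double.
_CODES = {16: "<e", 32: "<f", 64: "<d"}
_MASK64 = (1 << 64) - 1


def _to_float(prec, bits):
    """Interpret a prec-bit field as a floating-point value."""
    return struct.unpack(_CODES[prec], bits.to_bytes(prec // 8, "little"))[0]


def _to_bits(prec, value):
    """Pack a floating-point value into a prec-bit field."""
    return int.from_bytes(struct.pack(_CODES[prec], value), "little")


def pack_fields(prec, values):
    """Concatenate floats into one 128-bit integer, first value in the low bits."""
    word = 0
    for n, value in enumerate(values):
        word |= _to_bits(prec, value) << (n * prec)
    return word


def unpack_fields(prec, word):
    """Split a 128-bit integer into its 128/prec floating-point fields."""
    mask = (1 << prec) - 1
    return [_to_float(prec, (word >> i) & mask) for i in range(0, 128, prec)]


def read_pair(regs, r):
    """Read 128 bits: regs[r] is the low half, regs[r + 1] the high half."""
    return regs[r] | (regs[r + 1] << 64)


def write_pair(regs, r, value):
    """Write 128 bits across regs[r] (low half) and regs[r + 1] (high half)."""
    regs[r] = value & _MASK64
    regs[r + 1] = (value >> 64) & _MASK64


def group_fmul(regs, prec, ra, rb, rc):
    """Multiply corresponding prec-bit fields and store the concatenated result."""
    a = read_pair(regs, ra)                     # annotation [2]
    b = read_pair(regs, rb)
    mask = (1 << prec) - 1
    c = 0
    for i in range(0, 128, prec):               # annotation [3]
        ai = _to_float(prec, (a >> i) & mask)   # annotation [4]
        bi = _to_float(prec, (b >> i) & mask)
        ci = ai * bi                            # annotation [5]
        c |= _to_bits(prec, ci) << i            # annotation [6]
    write_pair(regs, rc, c)                     # annotation [7]
```

For example, with `regs = [0] * 64`, loading `pack_fields(32, [1.5, 2.0, -3.0, 0.5])` into registers 0-1 and `pack_fields(32, [2.0, 4.0, 0.5, 8.0])` into registers 62-63, then calling `group_fmul(regs, 32, 0, 62, 4)`, leaves the four products 3.0, 8.0, -1.5, and 4.0 in the corresponding 32-bit fields of registers 4-5. Because each loop iteration touches only its own field, the iterations could equally be carried out in any order or concurrently.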

27. Thus the Group Floating-point Multiply instructions are described in the 
Terpsichore manual (and also the Zeus manual) as interpreting contents of two source registers 
as respective pluralities of floating-point operands that are multiplied together, producing a 
plurality of products as results. The results are concatenated together and stored in a register 
specified by a third operand. 

28. At least the Terpsichore manual pages 19-21, 24-25, 29, 47-48, and 129-131 
describe all elements of claim 9 and substantially similar claim 18 (as amended), as described in 
more detail in paragraphs 29-32 of this declaration. Similarly, at least the Zeus manual pages 14- 
16, 19-20, 23-24, 55, and 258-260 describe all elements of claim 9 and substantially similar 
claim 18 (as amended). 

29. The element (of claim 9) wherein the execution unit is further operable, in 
response to decoding a second single instruction specifying a register containing a first plurality 
of floating-point operands and another register containing a second plurality of floating-point 
operands is described by the Terpsichore manual, for example as annotated and discussed by 
paragraphs 21-27 of this declaration. Each of the Group Floating-point Multiply instructions is 
an exemplary single instruction that specifies registers containing respective pluralities of 
floating-point operands, via, for example, the 'ra' and 'rb' operands. See paragraphs 26(b) and
26(c)ii of this declaration for additional detailed discussion. As discussed in paragraph 22 of this 
declaration, an execution unit is clearly disclosed that is operable as claimed in this element of 
claim 9, as well as the other elements of claim 9 discussed in the following paragraphs. 

30. The element (execution unit operable to) multiply the first plurality of floating- 
point operands by the second plurality of floating-point operands to produce a plurality of 
products is described by the Terpsichore manual, for example as annotated and discussed in 
paragraphs 21-27 of this declaration. Each of the Group Floating-point Multiply instructions is 
operable to perform a floating-point multiply on operands from registers specified by 'ra' and 
'rb' . See paragraph 26(c)iv of this declaration for additional detailed discussion. 

31. The element (execution unit operable to) provide the plurality of products to
partitioned fields of a result register as a catenated result is described by the Terpsichore manual, 
for example as annotated and discussed in paragraphs 21-27 of this declaration. Each of the 
Group Floating-point Multiply instructions is operable to produce products into fields of bits of 
result registers via, for example, the 'rc' operand. See paragraphs 26(c)v and 26(d) of this
declaration for additional detailed discussion. 

32. Thus every element of claim 9 and substantially similar claim 18 (as amended) 
are described at least by the Terpsichore manual on pages 19-21, 24-25, 29, 47-48, and 129-131. 
In addition, every element of claim 9 and substantially similar claim 18 are also described at 
least by the Zeus manual on pages 14-16 (describing floating-point data formats), pages 19-20 
(describing general registers), pages 23-24 and 55 (describing floating-point arithmetic 
hardware), and pages 258-260 (a definition for Ensemble Floating-point instructions, including 
Multiply forms). 



Claims 24 and 31 

33. The Terpsichore manual describes all of the aforementioned elements of claim 24 
(and the substantially similar aforementioned elements of claim 31), on at least pages 24-26 and 
150-157, describing Store Immediate instructions such as various forms of Store Multiplex 
Immediate instructions. The Zeus manual describes all of the elements of claim 24 (and the 
substantially similar aforementioned elements of claim 31), on at least pages 19-21, 123-125, and 
128-130. 

34. Claim 24 is dependent upon claim 19, and context associated with the element (of 
claim 24) the first and second operands is from claim 19, reproduced below (as amended): 

{claim 19} A programmable processor comprising:
a virtual memory addressing unit; 
an instruction path and a data path; 

an external interface operable to receive data from an external 
source and communicate the received data over the data path; 

a cache operable to retain data communicated between the external 
interface and the data path; 

a register file comprising a plurality of registers coupled to the data 
path; and an execution unit, coupled to the instruction and data paths, that is 
operable to decode and execute instructions received from the instruction path, the 
execution unit capable of performing a bitwise insert operation that operates on a 
first and a second operand stored in at least one register in the register file, 
wherein each bit in the second operand is individually selectable as either having 
a first predetermined value or a second predetermined value, wherein for each bit 
in the first operand, the bitwise insert operation inserts the bit into a 
corresponding bit position in a destination value if a corresponding bit in the 
second operand has the first predetermined value. 

35. As is described in more detail in paragraphs 36-39 of this declaration, the 
Terpsichore manual, on at least pages 24-26 and 150-157, describes all of the limitations of 
claim 19, in addition to claim 24, with respect to several variations of Store and Store Immediate 
instructions, including Store Multiplex and Store Multiplex Immediate forms. The initial Van
Dyke declaration, in paragraphs 22-28, discusses selected portions of those pages. Similarly, at
least the Zeus manual pages 19-21, 123-125, and 128-130 describe all elements of claims 19 and 
24. 

36. The element of (claim 19) to decode and execute instructions received from the 
instruction path, the execution unit capable of performing a bitwise insert operation that operates 
on a first and a second operand stored in at least one register in the register file is described by 
the Terpsichore manual, as annotated and discussed in paragraphs 22-28 of the initial Van Dyke 
declaration. Each of the Store Multiplex Immediate instructions is a single instruction, as 
evidenced at least by the dedicated operation codes "S.MUX.64.B.A.I" and "S.MUX.64.L.A.I". 
Each of the Store Multiplex Immediate instructions is for performing a bitwise insert operation, 
since the combination of bit-wise logical-AND and bit-wise logical OR described for computing 
the store value results in insertion of a bit in place of another bit, e.g. "insertion" (see paragraph 
27 of the initial Van Dyke declaration). The claimed at least one register corresponds to the 
register pair identified by the 'rb' operand (see paragraph 26 of the initial Van Dyke declaration). 
The claimed first and second operands correspond respectively to the odd- and even-numbered 
registers of the register pair. 

37. The element (of claim 19) wherein each bit in the second operand is individually 
selectable as either having a first predetermined value or a second predetermined value, wherein 
for each bit in the first operand, the bitwise insert operation inserts the bit into a corresponding 
bit position in a destination value if a corresponding bit in the second operand has the first 
predetermined value is described by the Terpsichore manual, as annotated and discussed in 
paragraphs 22-28 of the initial Van Dyke declaration. As discussed in paragraph 27 of the initial 
Van Dyke declaration, a store value is determined by bit-wise multiplexing (e.g. "inserting") 
between a first data input and a second data input, based on a control input. The second data 
input is from memory. The first data input is the upper 64 bits of a value identified in the
Terpsichore manual as 'm', and the control input is the lower 64 bits of 'm'. As discussed in
paragraph 26 of the initial Van Dyke declaration, the upper 64 bits of 'm' are obtained from the
odd-numbered register identified by 'rb', corresponding to the claimed first operand. Further, the
lower 64 bits of 'm' are obtained from the even-numbered register identified by 'rb',
corresponding to the claimed second operand.
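
For illustration only, the bit-wise multiplexing discussed in paragraph 37 may be sketched as follows. The sketch is not taken from the Terpsichore manual. The function name `store_multiplex`, its argument names, and the choice that a control bit of 1 selects the register data bit (standing in for the "first predetermined value") are assumptions made for the example.

```python
MASK64 = (1 << 64) - 1  # all ones across a 64-bit operand


def store_multiplex(m: int, memory_value: int) -> int:
    """Hypothetical sketch of a bit-wise insert on a 128-bit value m.

    m is the 128-bit contents of the register pair named by 'rb':
    upper 64 bits = data (first operand), lower 64 bits = control (second operand).
    """
    data = (m >> 64) & MASK64   # upper 64 bits of m
    control = m & MASK64        # lower 64 bits of m
    # AND/OR combination. Assumption: control bit 1 takes the data bit,
    # control bit 0 keeps the bit already in memory.
    return (data & control) | (memory_value & ~control & MASK64)
```

With a data value whose low byte is 0xAA, a control value of 0xF0, and a memory value of 0x55, only the upper nibble of the low byte is taken from the data, so the stored value under these assumptions is 0xA5.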

38. Therefore, according to the discussion in paragraphs 36-37 of this declaration, the
element (of claim 24) wherein each of the first and second operands has a width of 64 bits is
also described. The first operand corresponds to the upper 64 bits of 'm' that are obtained from
the odd-numbered register identified by 'rb'. The second operand corresponds to the lower 64 bits
of 'm' that are obtained from the even-numbered register identified by 'rb'. Thus both the first
and the second operands are described as having a width of 64 bits.

39. Thus every element of claim 24 is described at least by the Terpsichore manual on 
pages 24-26 and 150-157. In addition, every element of claim 24 is also described at least by the 
Zeus manual on pages 19-21, 123-125, and 128-130. 

40. Claim 31 is dependent upon claim 26, and context associated with the element (of 
claim 31) the first and second operands is from claim 26, reproduced below (as amended):

{ claim 26 } A device having installed therein a programmable processor, the
programmable processor comprising: 

a virtual memory addressing unit; 

an instruction path and a data path; 

an external interface operable to receive data from an external 
source and communicate the received data over the data path; 

a cache operable to retain data communicated between the external 
interface and the data path; 

a register file comprising a plurality of registers coupled to the data 
path; and an execution unit, coupled to the instruction and data paths, that is 
operable to decode and execute instructions received from the instruction path, the 
execution unit capable of performing a bitwise insert operation that operates on a 
first and a second operand stored in at least one register in the register file, 
wherein each bit in the second operand is individually selectable as either having 
a first predetermined value or a second predetermined value, wherein for each bit 
in the first operand, the bitwise insert operation inserts the bit into a 
corresponding bit position in a destination value if a corresponding bit in the 
second operand has the first predetermined value. 

41. Claim 26 (a device claim) is substantially similar to claim 19 (a programmable
processor claim), and as previously discussed, claim 31 is substantially similar to claim 24. 
Therefore paragraphs 33-39 in this declaration concerning claim 24 and parent claim 19 are also 
applicable to claim 31 and parent claim 26. Thus every element of claim 31 is described at least 
by the Terpsichore manual on pages 24-26 and 150-157. In addition, every element of claim 31 
is also described at least by the Zeus manual on pages 19-21, 123-125, and 128-130. 

Claims 25 and 32 

42. The Terpsichore manual describes all of the aforementioned elements of claim 25 
(and the substantially similar aforementioned elements of claim 32), on at least pages 19-21, 24- 
25, 29, 47-48, and 129-131, describing Group Floating-point instructions such as various forms 
of Group Floating-point Add, Divide, and Multiply instructions. Paragraphs 43-50 of this 
declaration discuss selected portions of those pages. Paragraphs 51-52 of this declaration 
describe how the elements of claim 25 (and the substantially similar aforementioned elements of 
claim 32) are described by those pages. The Zeus manual describes all of the elements of claim 
25 (and the substantially similar aforementioned elements of claim 32), on at least pages 14-16, 
19-20, 23-24, 55, and 258-260. 

43. As discussed in more detail in paragraph 21 of this declaration, the '840 and the 
'599 patents describe various floating-point data sizes. 

44. As discussed in more detail in paragraph 22 of this declaration, the '840 and the 
'599 patents describe various floating-point hardware capabilities.

45. The Terpsichore manual describes several types and variations of Group Floating-
point instructions, such as Group Floating-point Add (e.g. GF.ADD.64), Divide (e.g.
GF.DIV.32), and Multiply (e.g. GF.MUL.16) instructions, as described on pages 129-131, and
reproduced below (excerpted and annotations added). Similarly, the Zeus manual, for example
on pages 258-260, provides similar information.

Group Floating-point 

These operations take two values from registers, perform floating-point arithmetic 
on groups of bits in the operands, and place the concatenated results in a register. 



Operation codes

GF.ADD.64      Group floating-point add double
GF.ADD.64.C    Group floating-point add double ceiling
GF.ADD.64.F    Group floating-point add double floor
GF.ADD.64.N    Group floating-point add double nearest
GF.ADD.64.T    Group floating-point add double truncate
GF.ADD.64.X    Group floating-point add double exact
GF.DIV.16      Group floating-point divide half
GF.DIV.16.C    Group floating-point divide half ceiling
GF.DIV.16.F    Group floating-point divide half floor
GF.DIV.16.N    Group floating-point divide half nearest
GF.DIV.16.T    Group floating-point divide half truncate
GF.DIV.16.X    Group floating-point divide half exact
GF.DIV.32      Group floating-point divide single
GF.DIV.32.C    Group floating-point divide single ceiling
GF.DIV.32.F    Group floating-point divide single floor
GF.DIV.32.N    Group floating-point divide single nearest
GF.DIV.32.T    Group floating-point divide single truncate
GF.DIV.32.X    Group floating-point divide single exact
GF.DIV.64      Group floating-point divide double
GF.DIV.64.C    Group floating-point divide double ceiling
GF.DIV.64.F    Group floating-point divide double floor
GF.DIV.64.N    Group floating-point divide double nearest
GF.DIV.64.T    Group floating-point divide double truncate
GF.DIV.64.X    Group floating-point divide double exact
GF.MUL.16      Group floating-point multiply half
GF.MUL.16.C    Group floating-point multiply half ceiling
GF.MUL.16.F    Group floating-point multiply half floor



46. As discussed in paragraph 24 of this declaration, the '840 and the '599 patents 
describe an instruction format for the Group Floating-point instructions (including Add, Divide, 
and Multiply forms).

47. As discussed in paragraph 25 of this declaration, the '840 and the '599 patents
describe registers referenced as operands of some instructions (such as Group Floating-point
Add, Divide, and Multiply instructions).

48. As discussed in paragraph 26 of this declaration, the '840 patent and the '599 
patent describe definitions of various Group Floating-point instructions, including several 
Multiply forms. In addition, Add and Divide forms of Group Floating-point instructions are 
described by the '840 patent and the '599 patent, as evidenced by the following excerpt from 
page 131 of the Terpsichore manual: 

case op of
    GF.ADD:
        ci <- ai+bi
    GF.MUL:
        ci <- ai*bi
    GF.DIV:
        ci <- ai/bi
endcase

Similarly, the Zeus manual, for example on page 260, provides similar information. 

49. The discussion of Multiply forms of Group Floating-point instructions in sub- 
paragraphs 26(a)-26(d) of this declaration is generally applicable to Add and Divide forms, as 
well. Rather than multiplying the partitioned floating-point values (as discussed in sub- 
paragraph 26(c)iv of this declaration), the floating-point values are added (Add form) or divided 
(Divide form). 

50. Thus the Group Floating-point instructions of Add, Divide, and Multiply forms 
are described in the Terpsichore manual (and also the Zeus manual) as interpreting contents of 
two source registers as respective pluralities of floating-point operands that are added, divided, 
or multiplied together, respectively, producing a plurality of floating-point results that are 
concatenated together and stored in a register specified by a third operand. 
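
For illustration only, the behavior summarized in paragraph 50 may be sketched in Python as follows. The sketch is not the manual's definition. The names `group_float`, `pack_fields`, and `unpack_fields` are hypothetical, host floating-point arithmetic stands in for the hardware, and the rounding and exception options (C, F, N, T, X) are deliberately omitted. A division by zero raises a Python exception here instead of following any behavior described in the manuals.

```python
import struct

# struct format characters for IEEE 754 half, single, and double precision
_FMT = {16: "e", 32: "f", 64: "d"}


def pack_fields(values, prec):
    """Pack floats into one integer, field k occupying bits k*prec..k*prec+prec-1."""
    word = 0
    for k, v in enumerate(values):
        word |= int.from_bytes(struct.pack("<" + _FMT[prec], v), "little") << (k * prec)
    return word


def unpack_fields(word, prec):
    """Split a 128-bit integer into its floating-point fields, lowest field first."""
    mask = (1 << prec) - 1
    return [
        struct.unpack("<" + _FMT[prec], ((word >> i) & mask).to_bytes(prec // 8, "little"))[0]
        for i in range(0, 128, prec)
    ]


def group_float(op, prec, a, b):
    """Apply op field by field to two 128-bit values and catenate the results."""
    ops = {"ADD": lambda x, y: x + y, "MUL": lambda x, y: x * y, "DIV": lambda x, y: x / y}
    results = [ops[op](ai, bi) for ai, bi in zip(unpack_fields(a, prec), unpack_fields(b, prec))]
    return pack_fields(results, prec)  # catenated result for the destination register
```

As a usage example, `group_float("MUL", 32, a, b)` with `a` holding the four single-precision values 1.0, 2.0, 3.0, 4.0 and `b` holding 0.5, 0.5, 2.0, 4.0 yields a 128-bit value whose four fields are the element-wise products.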

51. The element (of claim 25) wherein the execution unit is further capable of
executing a plurality of different group floating-point arithmetic operations that arithmetically 
operate on multiple floating-point operands stored in partitioned fields of registers in the register 
file to produce a catenated result that is returned to a register in the register file, wherein the 
catenated result comprises a plurality of individual floating-point results is described by the 
Terpsichore manual, for example as annotated and discussed in paragraphs 43-50 of this 
declaration. The Add, Divide, and Multiply forms of Group Floating-point instructions are 
exemplary instructions that embody arithmetic operations that operate on multiple floating-point 
operands, for example as specified by the 'ra' and 'rb' operands. Results of the arithmetic
operations are returned, for example, to result registers specified by the 'rc' operand. As 
discussed in paragraph 22 of this declaration, an execution unit is clearly disclosed that is 
operable as claimed in claim 25. 

52. Thus every element of claim 25 and substantially similar claim 32 (as amended)
is described at least by the Terpsichore manual on pages 19-21, 24-25, 29, 47-48, and 129-131.
In addition, every element of claim 25 and substantially similar claim 32 is also described at
least by the Zeus manual on pages 14-16 (describing floating-point data formats), pages 19-20 
(describing general registers), pages 23-24 and 55 (describing floating-point arithmetic 
hardware), and pages 258-260 (a definition for Ensemble Floating-point instructions, including 
Add, Divide, and Multiply forms). 

Summary and Closing: 

53. The '840 patent, including the Terpsichore manual, provides sufficient 
information in sufficient detail describing the claimed invention (as amended) of the 10/757,516 
patent application, that one of ordinary skill in the art would reasonably conclude that the 
inventors had possession of the claimed invention at the time of filing the '840 patent. Further, 
the '840 patent, including the Terpsichore manual, provides sufficient information regarding the 
subject matter of the claimed invention (as amended) of the 10/757,516 patent application to 
enable one of ordinary skill in the pertinent art to make and use the claimed invention without 
undue experimentation. In addition, the '599 patent, including the Zeus manual, provides 
sufficient information in sufficient detail describing the claimed invention (as amended) of the 
10/757,516 patent application, that one of ordinary skill in the art would reasonably conclude 
that the inventors had possession of the claimed invention at the time of filing the '599 patent. 
Further, the '599 patent, including the Zeus manual, provides sufficient information regarding 
the subject matter of the claimed invention (as amended) of the 10/757,516 patent application to 
enable one of ordinary skill in the pertinent art to make and use the claimed invention without 
undue experimentation. 

54. Therefore, I believe that each of the '840 patent, including the Terpsichore
manual, and the '599 patent, including the Zeus manual, provides adequate written description
and enablement as required by 35 USC § 112 for the limitations of claims 9, 18, 24, 25, 31, and
32 (as amended) of the 10/757,516 patent application, as discussed in paragraph 3 of this
declaration.

55. I have had no communication with any of the inventors of the 10/757,516 patent 
application (Craig Hansen and John Moussouris) relating to any material in this declaration. 

56. I have been hired as a consultant in connection with procedures before the United 
States Patent and Trademark Office (USPTO) regarding patents and patent applications assigned 
to Microunity Systems Engineering, Inc., including the media processor patent application. I am 
being compensated for my services at the rate of $325/hour. Other than acting as a consultant in 
connection with procedures before the USPTO, I have no interest or connection with Microunity 
Systems Engineering, Inc. 

57. During my evaluation of the media processor patent application, I have been 
impressed by the thoroughness and overall high quality of the Terpsichore and Zeus manuals.
The manuals provide clear and unambiguous descriptions of media processing systems and are 
thorough and well-written. The manuals provide comprehensive descriptions of instructions in 
complete architectural detail. The information in the manuals would have been readily 
understood and easily accessible to software engineers coding the media processing systems, and 
hardware engineers implementing microprocessors for use in the media processing systems, and
that is exactly what architecture reference manuals should be. This is not surprising, since the
'840 patent and the '599 patent each include an architecture manual that is intended to enable 
hardware engineers to do exactly that - design, build, and implement a media processor that 
would include circuitry for the claim limitations set forth in paragraph 3 of this declaration, as 
described in the Terpsichore and the Zeus architecture manuals.

58. I hereby declare that all statements made herein of my own knowledge are
true and that all statements made on information and belief are believed to be true; and further 
that these statements were made with the knowledge that willful false statements and the like so 
made are punishable by fine or imprisonment, or both, under Section 1001 Title 18 of the United 
States Code and that such willful false statements may jeopardize the validity of the application 
or any patent issuing therefrom.

Fixed-point

Terpsichore provides load and store instructions to move data between memory
and the registers, branch instructions to compare the contents of registers and to 
transfer control from one code address to another, and arithmetic operations to 
perform computation on the contents of registers, returning the result to registers. 

Load and Store

The load and store instructions move data between memory and the registers. 
When loading data from memory into a register, values are zero-extended or sign- 
extended to fill the register. When storing data from a register into memory, 
values are truncated on the left to fit the specified memory region. 
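
The extension and truncation rules above can be sketched as follows. This is a minimal illustration and not code from the manual: the function names are hypothetical, and the 64-bit register width used as the default is an assumption made for the example.

```python
def load_extend(raw: int, size_bits: int, signed: bool, reg_bits: int = 64) -> int:
    """Return the register bit pattern after loading a size_bits value.

    reg_bits=64 is an assumed register width for this sketch.
    """
    raw &= (1 << size_bits) - 1
    if signed and (raw >> (size_bits - 1)) & 1:
        raw -= 1 << size_bits               # sign-extend: value is negative
    return raw & ((1 << reg_bits) - 1)      # zero-extension needs no extra work


def store_truncate(reg: int, size_bits: int) -> int:
    """Truncate on the left: keep only the low-order size_bits of the register."""
    return reg & ((1 << size_bits) - 1)
```

Under these assumptions, loading the byte 0x80 as a signed value fills the upper bits of the register with ones, while loading it as an unsigned value fills them with zeros.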

Load and store instructions that specify a memory region of more than one byte 
may use either little-endian or big-endian byte ordering: the size and ordering are 
explicitly specified in the instruction. Regions larger than one byte may be either
aligned to addresses that are an even multiple of the size of the region, or of
unspecified alignment: alignment checking is also explicitly specified in the
instruction.

The load and store instructions are used for fixed-point data as well as floating-
point and digital signal processing data; Terpsichore has a single bank of registers
for all data types.

Swap instructions provide multithread and multiprocessor synchronization, using 
indivisable operations: add-and-swap, compare-and-swap, and multiplex-and- 
swap. A store-multiplex operation provides the ability to indivisably write to a 
portion of an octlet. These instructions always operate on aligned octlet data, using 
either little-endian or big-endian byte ordering. 

Branch Conditionally 

The fixed-point compare-and-branch instructions provide all arithmetic tests for
equality and inequality of signed and unsigned fixed-point values. Tests are 
performed either between two operands contained in general registers, or on the 
bitwise and of two operands. Depending on the result of the compare, either a 
branch is taken, or not taken. A taken branch causes an immediate transfer of the 
program counter to the target of the branch, specified by a 12-bit signed offset 
from the location of the branch instruction. A non-taken branch causes no 
transfer; execution continues with the following instruction. 






Other branch instructions provide for unconditional transfer of control to 
addresses too distant to be reached by a 12-bit offset, and to transfer to a target 
while placing the location following the branch into a register. The branch through 
gateway instruction provides a secure means to access code at a higher privilege
level, in a form similar to a normal procedure call.

Arithmetic Operations

The operations supported in hardware are floating-point add, subtract, multiply,
divide, and square root. Other operations required by the ANSI/IEEE floating-
point standard are provided by software libraries.

The operations explicitly specify the [precision] of the operation, and round the
result to the specified precision at the conclusion of each operation.

A single instruction provides a floating-point multiply with the result fed into a 
floating-point add. The result is computed as if the multiply is performed to 
infinite precision, added as if in infinite precision, then rounded. This operation is 
a particularly good match to the needs of vector linear algebra routines. 
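
The single-rounding behavior described above can be illustrated with exact rational arithmetic. This is a sketch under stated assumptions and not the hardware algorithm: the function name is hypothetical, the inputs are assumed to be finite double-precision values, and the final rounding is the host's round-to-nearest.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once to double precision."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # no intermediate rounding
    return float(exact)                              # single final rounding


# A case where one rounding and two roundings differ:
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0
separately_rounded = a * b + c          # the product rounds to 1.0 first
fused = fused_multiply_add(a, b, c)     # the exact product is 1 - 2**-60
```

In this example the separately rounded computation returns zero, while the fused computation retains the small residual term.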

Rounding 

Rounding is specified within the instructions explicitly, to avoid maintaining
explicit state for a rounding mode. 

Exceptions 

All the mandated floating-point exception conditions cause a trap when they 
occur; maintenance of sticky and other status bits may be performed using
software routines. Because the floating-point inexact exception may be very 
frequent, this exception only occurs when specified in the instruction explicitly. 
Arithmetic operations may also specify that all exceptions are to be handled by 
default, generating special results instead of traps. 



Digital Signal Processing



The Terpsichore processor provides a set of operations that maintain the fullest 
possible use of 64- and 128-bit data paths when operating on lower-precision
fixed-point or floating-point vector values. These operations are useful for several 
application areas, including digital signal processing, image processing, and 
synthetic graphics. The basic goal of these operations is to accelerate the 
performance of algorithms that exhibit the following characteristics: 

Low-precision arithmetic

The operands and intermediate results are fixed-point values represented in no 
greater than 64 bit precision. For floating-point arithmetic, operands and 
intermediate results are of 16, 32, or 64 bit precision. 

The use of fixed-point arithmetic permits various forms of operation reordering 
that are not permitted in floating-point arithmetic. Specifically, commutativity and 
associativity, and distribution identities can be used to reorder operations. 
Compilers can evaluate operations to determine what intermediate precision is 
required to get the specified arithmetic result.

branch predicts that a future execution of the same branch will be taken. More
elaborate prediction may cache the source and target addresses of multiple 
branches, both conditional and unconditional, and both forward and reverse. 

The hardware prediction mechanism is tuned for optimizing conditional branches 
that close loops or express frequent alternatives, and will generally require 
substantially more cycles when executing conditional branches whose outcome is 
not predominately taken or not-taken. For such cases of unpredictable conditional 
results, the use of code which avoids conditional branches in favor of the use of set 
on compare and multiplex instructions may result in greater performance. 

Where the above technique may not be applicable, a Euterpe pipeline may ensure 
that conditional branches which have a small positive offset be handled as if the 
branch is always predicted to be not taken, with the recovery of a misprediction 
causing cancellation of the instructions which have already been issued but not 
completed which would be skipped over by the taken conditional branch. This
"conditional-skip" optimization is performed by the Euterpe implementation and 
requires no specific architectural feature to access or implement. 

A Euterpe pipeline may also perform "branch-return" optimization, in which a
branch-and-link instruction saves a branch target address which is used to predict 
the target of the next branch-register instruction. This optimization may be 
implemented with a depth of one (only one return address kept), or as a stack of 
finite depth, where a branch and link pushes onto the stack, and a branch-register 
pops from the stack. This optimization can eliminate the misprediction cost of 
simple procedure calls, as the calling branch is susceptible to hardware 
prediction, and the returning branch is predictable by the branch-return 
optimization. Like the conditional-skip optimization described above, this feature is 
performed by the Euterpe implementation and requires no specific architectural 
feature to access or implement. 
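
The branch-return optimization described above can be modeled as a small return-address stack. This is a hedged sketch and not the Euterpe implementation: the class and method names are hypothetical, and dropping the oldest entry when the stack is full is an assumption, since the text does not state an overflow policy.

```python
class ReturnPredictor:
    """Toy model of a return-address predictor of finite depth."""

    def __init__(self, depth: int = 1):
        self.depth = depth   # depth=1 keeps only one return address
        self.stack = []

    def branch_and_link(self, return_address: int) -> None:
        """Push the address following the call."""
        if len(self.stack) == self.depth:
            self.stack.pop(0)            # assumed policy: discard the oldest entry
        self.stack.append(return_address)

    def branch_register(self, actual_target: int) -> bool:
        """Pop a prediction and report whether it matched the actual target."""
        predicted = self.stack.pop() if self.stack else None
        return predicted == actual_target
```

With a depth of two, nested calls to two procedures are both predicted correctly on return. With a depth of one, only the innermost return is predicted.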

Additional Load and Execute Resources 

Studies of the dynamic distribution of Euterpe instructions on various benchmark 
suites indicate that the most frequently-issued instruction classes are load 
instructions and execute instructions. In a high-performance Euterpe 
implementation, it is advantageous to consider execution pipelines in which the
ability to target the machine resources toward issuing load and execute
instructions is increased.

One of the means to increase the ability to issue execute-class instructions is to
provide the means to issue two execute instructions in a single-issue string. The
execution unit actually requires several distinct resources, so by partitioning these 
resources, the issue capability can be increased without increasing the number of 
functional units, other than the increased register file read and write ports. The 
partitioning favored for the initial implementation places all instructions that 
involve shifting and shuffling in one execution unit, and all instructions that 
involve multiplication, including fixed-point and floating-point multiply and add in 
another unit. Resources used for implementing add, subtract, and bitwise logical 
operations may be duplicated, being modest in size compared to the shift and
multiply units, or shared between the two units, as the operations have low-
enough latency that two operations might be pipelined within a single issue cycle.
These instructions must generally be independent, except perhaps that two 
simple add, subtract, or bitwise logical may be performed dependently, if the 
resources for executing simple instructions are shared between the execution 
units. 

One of the means to increase the ability to issue load-class instructions is to
provide the means to issue two load instructions in a single-issue string. This 
would generally increase the resources required of the data fetch unit and the 
data cache, but a compensating solution is to steal the resources for the store 
instruction to execute the second load instruction. Thus, a single-issue string can 
then contain either two load instructions, or one load instruction and one store 
instruction, which uses the same register read ports and address computation 
resources as the basic 5-instruction string. This capability also may be employed to 
provide support for unaligned load and store instructions, where a single-issue 
string may contain as an alternative a single unaligned load or store instruction 
which uses the resources of the two load-class units in concert to accomplish the 
unaligned memory operation. 

Result Forwarding

When temporally adjacent instructions are executed by separate resources, the
results of the first instruction must generally be forwarded directly to the resource
used to execute the second instruction, where the result replaces a value which 
may have been fetched from a register file. Such forwarding paths use significant 
resources. A Terpsichore implementation must generally provide forwarding 
resources so that dependencies from earlier instructions within a string are 
immediately forwarded to later instructions, except between a first and second 
execution instruction as described above. In addition, when forwarding results 
from the execution units back to the data fetch unit, additional delay may be

Group Floating-point

These operations take two values from registers, perform floating-point arithmetic 
on groups of bits in the operands, and place the concatenated results in a register. 



Operation codes

GF.ADD.16      Group floating-point add half
GF.ADD.16.C    Group floating-point add half ceiling
GF.ADD.16.F    Group floating-point add half floor
GF.ADD.16.N    Group floating-point add half nearest
GF.ADD.16.T    Group floating-point add half truncate
GF.ADD.16.X    Group floating-point add half exact
GF.ADD.32      Group floating-point add single
GF.ADD.32.C    Group floating-point add single ceiling
GF.ADD.32.F    Group floating-point add single floor
GF.ADD.32.N    Group floating-point add single nearest
GF.ADD.32.T    Group floating-point add single truncate
GF.ADD.32.X    Group floating-point add single exact
GF.ADD.64      Group floating-point add double
GF.ADD.64.C    Group floating-point add double ceiling
GF.ADD.64.F    Group floating-point add double floor
GF.ADD.64.N    Group floating-point add double nearest
GF.ADD.64.T    Group floating-point add double truncate
GF.ADD.64.X    Group floating-point add double exact
GF.DIV.16      Group floating-point divide half
GF.DIV.16.C    Group floating-point divide half ceiling
GF.DIV.16.F    Group floating-point divide half floor
GF.DIV.16.N    Group floating-point divide half nearest
GF.DIV.16.T    Group floating-point divide half truncate
GF.DIV.16.X    Group floating-point divide half exact
GF.DIV.32      Group floating-point divide single
GF.DIV.32.C    Group floating-point divide single ceiling
GF.DIV.32.F    Group floating-point divide single floor
GF.DIV.32.N    Group floating-point divide single nearest
GF.DIV.32.T    Group floating-point divide single truncate
GF.DIV.32.X    Group floating-point divide single exact
GF.DIV.64      Group floating-point divide double
GF.DIV.64.C    Group floating-point divide double ceiling
GF.DIV.64.F    Group floating-point divide double floor
GF.DIV.64.N    Group floating-point divide double nearest
GF.DIV.64.T    Group floating-point divide double truncate
GF.DIV.64.X    Group floating-point divide double exact
GF.MUL.16      Group floating-point multiply half
GF.MUL.16.C    Group floating-point multiply half ceiling
GF.MUL.16.F    Group floating-point multiply half floor

- 129 - microu nity 
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GF MUL 16 N 




Or.lVIUL. 1 D. 1 


Group fioating point multiply half truncate 


GF.MUL.16.X 


Group floating-point multiply half exact 


GF.MUL.32 


Group floating-point multiply single 


GF.MUL.32.C 


Group floating-point multiply single ceiling 


GF.MUL.32. F 


Group floating-point multiply single floor 


GF.MUL.32. N 


Group floating-point multiply sinyle nearest 


GF.MUL.32.T 


Group floating-point multiply single truncate 


GF.MUL.32. X 


Group floating-point multiply single exact 


GF.MUL.64 


Group floating-point multiply double 


GF.MUL.64.C 


Group floating-point multiply; double ceiling 


GF.MUL.64.F 


Group floating-point multiply double floor 


GF.MUL.64.N 


Group floating-point multiply double nearest 


GF.MUL.64.T 


Group floating-point multiply double truncate 


GF.MLlL64.X 


Group floating-point multiply double exact 





            op      prec            round/trap

add         ADD     16 32 64 128    none C F N T X
divide      DIV     16 32 64 128    none C F N T X
multiply    MUL     16 32 64 128    none C F N T X
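The table above implies a regular naming scheme for the operation codes. As an illustration only, the following sketch enumerates the mnemonics that the table would generate; the function name `gf_mnemonics` and the use of an empty string for the "none" rounding column are assumptions made for this example and do not appear in the manual.

```python
from itertools import product

# Values taken from the op / prec / round-trap table.
OPS = ("ADD", "DIV", "MUL")
PRECISIONS = (16, 32, 64, 128)
ROUNDINGS = ("", "C", "F", "N", "T", "X")  # "" stands for the table's "none" (assumption)


def gf_mnemonics():
    """Return every GF.op.prec[.round] mnemonic implied by the table."""
    names = []
    for op, prec, rnd in product(OPS, PRECISIONS, ROUNDINGS):
        name = f"GF.{op}.{prec}"
        if rnd:  # append the rounding suffix only when one is specified
            name += f".{rnd}"
        names.append(name)
    return names
```

Under these assumptions the list contains entries such as GF.DIV.32.F and GF.MUL.64, matching the form of the operation codes listed above.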



Format

GF.op.prec.round rc=ra,rb

        24 23    18 17    12 11     6 5        0
| GF.prec |   ra   |   rb   |   rc   | op.round |
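The field boundaries shown in the format diagram can be illustrated with a small encoder and decoder. This is a sketch under stated assumptions: the numeric values of the major field (GF.prec) and the minor field (op.round) are not given in this excerpt, so they are treated as opaque integers, and the names `encode_gf` and `decode_gf` are hypothetical.

```python
def encode_gf(major, ra, rb, rc, minor):
    """Pack the fields at the bit positions shown in the format diagram.

    major occupies the bits from 24 upward; ra is bits 23..18, rb is
    bits 17..12, rc is bits 11..6, and minor (op.round) is bits 5..0.
    """
    for name, value in (("ra", ra), ("rb", rb), ("rc", rc), ("minor", minor)):
        if not 0 <= value < 64:  # each of these fields is 6 bits wide
            raise ValueError(f"{name} does not fit in 6 bits: {value}")
    if major < 0:
        raise ValueError("major must be non-negative")
    return (major << 24) | (ra << 18) | (rb << 12) | (rc << 6) | minor


def decode_gf(word):
    """Split an instruction word back into its fields."""
    return {
        "major": word >> 24,
        "ra": (word >> 18) & 0x3F,
        "rb": (word >> 12) & 0x3F,
        "rc": (word >> 6) & 0x3F,
        "minor": word & 0x3F,
    }
```

For example, `decode_gf(encode_gf(0x5A, 1, 2, 3, 0x15))` returns the five fields unchanged; the value 0x5A is an arbitrary placeholder, not an opcode from the manual.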



Description

The contents of registers ra and rb are combined using the specified floating-point
operation. The result is placed in register rc. The operation is rounded using the
specified rounding option or using round-to-nearest if not specified. If a rounding
option is specified, the operation raises a floating-point exception if a floating-point
invalid operation, divide by zero, overflow, or underflow occurs, or when specified,
if the result is inexact. If a rounding option is not specified, floating-point
exceptions are not raised, and are handled according to the default rules of IEEE
754.

Definition

def GroupFloatingPoint(op,prec,round,ra,rb,rc) as
     a <- RegRead(ra, 128)
     b <- RegRead(rb, 128)
     for i <- 0 to 128-prec by prec
          ai <- F(prec, a[i+prec-1..i])
          bi <- F(prec, b[i+prec-1..i])
          if round≠NONE then
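The lane-by-lane structure of the group floating-point operation can be sketched in Python. This is only an illustration of the default case (no rounding option, round-to-nearest), restricted to the 16-, 32-, and 64-bit precisions that Python's `struct` module can encode. The names `group_fp`, `pack_lanes`, and `unpack_lanes` are hypothetical, the 128-bit registers are modeled as Python integers, and overflow or division by zero surfaces here as a Python exception rather than following the IEEE 754 default rules that the manual describes.

```python
import operator
import struct

_FLOAT_FMT = {16: "<e", 32: "<f", 64: "<d"}  # IEEE half, single, double
_UINT_FMT = {16: "<H", 32: "<I", 64: "<Q"}   # same-width unsigned integers
_OPS = {"ADD": operator.add, "DIV": operator.truediv, "MUL": operator.mul}


def _to_float(bits, prec):
    """Reinterpret a prec-bit pattern as a floating-point value."""
    return struct.unpack(_FLOAT_FMT[prec], struct.pack(_UINT_FMT[prec], bits))[0]


def _to_bits(value, prec):
    """Round a Python float to prec bits and return its bit pattern."""
    return struct.unpack(_UINT_FMT[prec], struct.pack(_FLOAT_FMT[prec], value))[0]


def group_fp(op, prec, a, b):
    """Apply op to each prec-bit lane of the 128-bit values a and b."""
    mask = (1 << prec) - 1
    result = 0
    for i in range(0, 128, prec):  # i = 0, prec, ..., 128 - prec
        ai = _to_float((a >> i) & mask, prec)
        bi = _to_float((b >> i) & mask, prec)
        result |= _to_bits(_OPS[op](ai, bi), prec) << i
    return result


def pack_lanes(values, prec):
    """Helper for the example: pack floats into a 128-bit integer, lane 0 lowest."""
    word = 0
    for k, value in enumerate(values):
        word |= _to_bits(value, prec) << (k * prec)
    return word


def unpack_lanes(word, prec):
    """Helper for the example: unpack a 128-bit integer into its float lanes."""
    mask = (1 << prec) - 1
    return [_to_float((word >> i) & mask, prec) for i in range(0, 128, prec)]
```

As a usage example, multiplying the four single-precision lanes 1.0, 2.0, 3.0, 4.0 by 0.5 with `group_fp("MUL", 32, a, b)` yields lanes 0.5, 1.0, 1.5, 2.0. The explicit rounding options (C, F, N, T, X) and their exception behavior are not modeled.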
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Store 

These operations add the contents of two registers to produce a virtual address,
and store the contents of a register into memory.

Operation codes



S.8 46 


Store byte 


S.16.B 


Store double big-endian 


S.16.B.A 


Store double big-endian aligned 


S.16.L 


Store double little-endian 


S.16.L.A 


Store double little-endian aligned 


S.32.B


Store quadlet big-endian


S.32.B.A


Store quadlet big-endian aligned


S.32.L


Store quadlet little-endian


S.32.L.A


Store quadlet little-endian aligned 


S.64.B 


Store octlet big-endian 


S.64.B.A 


Store octlet big-endian aligned 


S.64.L 


Store octlet little-endian 


S.64.L.A 


Store octlet little-endian aligned 


S.128.B 


Store hexlet big-endian


S.128.B.A


Store hexlet big-endian aligned


S.128.L


Store hexlet little-endian


S.128.L.A


Store hexlet little-endian aligned 


S.AAS.64.B.A 


Store add-and-swap octlet big-endian aligned 


S.AAS.64.L.A 


Store add-and-swap octlet little-endian aligned 


S.CAS.64.B.A


Store compare-and-swap octlet big-endian aligned 


S.CAS.64.L.A 


Store compare-and-swap octlet little-endian aligned 


S.MAS.64.B.A 


Store multiplex-and-swap octlet big-endian aligned 


S.MAS.64.L.A 


Store multiplex-and-swap octlet little-endian aligned 


S.MUX.64.B.A 


Store multiplex octlet big-endian aligned 


S.MUX.64.L.A 


Store multiplex octlet little-endian aligned 



size            ordering    alignment

8
16 32 64 128    L B
16 32 64 128    L B         A
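The size, ordering, and alignment options can be illustrated with a short sketch of a plain store. It assumes that memory is modeled as a Python `bytearray`, that an aligned store requires the address to be a multiple of the operand size in bytes, and that a misaligned access raises an error; the names `store` and `AlignmentError` are hypothetical, and the add-and-swap, compare-and-swap, and multiplex variants are not modeled.

```python
class AlignmentError(Exception):
    """Hypothetical stand-in for the architecture's response to a misaligned access."""


def store(memory, ra_value, rb_value, data, size, ordering="L", aligned=False):
    """Store the low `size` bits of `data` at the address ra_value + rb_value.

    ordering is "L" (little-endian) or "B" (big-endian); aligned=True
    models the ".A" suffix. Returns the virtual address that was used.
    """
    address = ra_value + rb_value  # sum of the contents of two registers
    nbytes = size // 8
    if aligned and address % nbytes != 0:
        raise AlignmentError(f"address {address:#x} is not {nbytes}-byte aligned")
    if address < 0 or address + nbytes > len(memory):
        raise IndexError("address outside the modeled memory")
    byteorder = "little" if ordering == "L" else "big"
    value = data & ((1 << size) - 1)  # keep only the bits being stored
    memory[address:address + nbytes] = value.to_bytes(nbytes, byteorder)
    return address
```

For instance, with `mem = bytearray(16)`, the call `store(mem, 4, 4, 0x11223344, 32, "B", aligned=True)` writes the bytes 11 22 33 44 at offsets 8 through 11.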



>t specify byte ordering, nor need it specify alignment checking, a 



13/20 



Supplementary Declaration of Korbin S Van Dyke - Exhibit A 







Terpsichore System Architecture Wed. Aug 2, 1995 



Store Immediate 

These operations add the contents of a register to a sign-extended immediate
value to produce a virtual address, and store the contents of a register into
memory.

Operation codes 







S.16.B.A.I


Store double big-endian aligned immediate


S.16.B.I


Store double big-endian immediate


S.16.L.I


Store double little-endian immediate


S.32.B.A.I


Store quadlet big-endian aligned immediate


S.32.B.I 


Store quadlet big-endian immediate 


S.32.L.A.I


Store quadlet little-endian aligned immediate 


S.32.L.I 


Store quadlet little-endian immediate 


S.64.B.A.I


Store octlet big-endian aligned immediate 


S.64.B.I 


Store octlet big-endian immediate 


S.64.L.A.I 


Store octlet little-endian aligned immediate 


S.64.L.I


Store octlet little-endian immediate 


S.128.B.A.I 


Store hexlet big-endian aligned immediate 


S.128.B.I 


Store hexlet big-endian immediate 


S.128.L.A.I 


Store hexlet little-endian aligned immediate 


S.128.L.I 


Store hexlet little-endian immediate 


S.AAS.64.B.A.I 


Store add-and-swap octlet big-endian aligned immediate 


S.AAS.64.L.A.I


Store add-and-swap octlet little-endian aligned immediate


S.CAS.64.B.A.I


Store compare-and-swap octlet big-endian aligned immediate


S.CAS.64.L.A.I


Store compare-and-swap octlet little-endian aligned immediate


S.MAS.64.B.A.I


Store multiplex-and-swap octlet big-endian aligned immediate


S.MAS.64.L.A.I


Store multiplex-and-swap octlet little-endian aligned immediate


S.MUX.64.B.A.I


Store multiplex octlet big-endian aligned immediate


S.MUX.64.L.A.I


Store multiplex octlet little-endian aligned immediate 



size            ordering    alignment

8
16 32 64 128    L B
16 32 64 128    L B         A
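The address calculation for the immediate forms differs from the register forms only in its second operand. The sketch below illustrates sign extension of an immediate field and its addition to a register value. The width of the immediate field is not stated in this excerpt, so it is passed in as a parameter; any scaling of the immediate by the operand size is likewise not modeled. The names `sign_extend` and `immediate_address` are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1      # discard anything above the field
    sign = 1 << (bits - 1)        # weight of the field's sign bit
    return (value ^ sign) - sign  # subtract 2**bits when the sign bit is set


def immediate_address(ra_value, imm_field, imm_bits):
    """Virtual address = register contents + sign-extended immediate.

    imm_bits is an assumption supplied by the caller; the excerpt does
    not give the immediate field width.
    """
    return ra_value + sign_extend(imm_field, imm_bits)
```

With an assumed 12-bit field, `immediate_address(0x1000, 0xFFC, 12)` evaluates to 0xFFC, because the field value 0xFFC sign-extends to -4.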



specify byte ordering,



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
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